使用移动操纵器来整理家庭环境,在机器人技术中提出了各种挑战,例如适应大型现实世界的环境变化,以及在人类面前的安全和强大的部署。2021年9月举行的全球竞赛,对真正的家庭环境中的整理任务进行了基准测试,重要的是,对全面的系统性能进行了测试。对于此挑战,我们开发了整个家庭服务机器人系统,该机器人系统利用数据驱动的方法来适应众多的方法在执行过程中发生的边缘案例,而不是经典的手动预编程解决方案。在本文中,我们描述了提出的机器人系统的核心成分,包括视觉识别,对象操纵和运动计划。我们的机器人系统赢得了二等奖,验证了数据驱动的机器人系统在家庭环境中移动操作的有效性和潜力。
translated by 谷歌翻译
神经辐射场(NERF)是数据驱动3D重建中的流行方法。鉴于其简单性和高质量的渲染,正在开发许多NERF应用程序。但是,NERF的大量的速度很大。许多尝试如何加速NERF培训和推理,包括复杂的代码级优化和缓存,使用复杂的数据结构以及通过多任务和元学习的摊销。在这项工作中,我们通过NERF之前通过经典技术镜头重新审视NERF的基本构建块。我们提出了Voxel-Accelated Nerf(VaxnerF),与Visual Hull集成了Nerf,一种经典的3D重建技术,只需要每张图像的二进制前景背景像素标签。可视船体,可在大约10秒内优化,可以提供粗略的现场分离,以省略NERF中的大量网络评估。我们在流行的JAXNERF Codebase提供了一个干净的全力验光,基于JAX的实现,其仅包括大约30行的代码更改和模块化视觉船体子程序,并在高度表现的JAXNERF之上实现了大约2-8倍的速度学习基线具有零劣化呈现质量。具有足够的计算,这有效地将单位训练从小时到30分钟缩小到30分钟。我们希望VAXNERF - 一种仔细组合具有深入方法的经典技术(可谓更换它) - 可以赋予并加速新的NERF扩展和应用,以其简单,可移植性和可靠的性能收益。代码在https://github.com/naruya/vaxnerf提供。
translated by 谷歌翻译
Deep image prior (DIP) has recently attracted attention owing to its unsupervised positron emission tomography (PET) image reconstruction, which does not require any prior training dataset. In this paper, we present the first attempt to implement an end-to-end DIP-based fully 3D PET image reconstruction method that incorporates a forward-projection model into a loss function. To implement a practical fully 3D PET image reconstruction, which could not be performed due to a graphics processing unit memory limitation, we modify the DIP optimization to block-iteration and sequentially learn an ordered sequence of block sinograms. Furthermore, the relative difference penalty (RDP) term was added to the loss function to enhance the quantitative PET image accuracy. We evaluated our proposed method using Monte Carlo simulation with [$^{18}$F]FDG PET data of a human brain and a preclinical study on monkey brain [$^{18}$F]FDG PET data. The proposed method was compared with the maximum-likelihood expectation maximization (EM), maximum-a-posterior EM with RDP, and hybrid DIP-based PET reconstruction methods. The simulation results showed that the proposed method improved the PET image quality by reducing statistical noise and preserved a contrast of brain structures and inserted tumor compared with other algorithms. In the preclinical experiment, finer structures and better contrast recovery were obtained by the proposed method. This indicated that the proposed method can produce high-quality images without a prior training dataset. Thus, the proposed method is a key enabling technology for the straightforward and practical implementation of end-to-end DIP-based fully 3D PET image reconstruction.
translated by 谷歌翻译
The black-box nature of end-to-end speech translation (E2E ST) systems makes it difficult to understand how source language inputs are being mapped to the target language. To solve this problem, we would like to simultaneously generate automatic speech recognition (ASR) and ST predictions such that each source language word is explicitly mapped to a target language word. A major challenge arises from the fact that translation is a non-monotonic sequence transduction task due to word ordering differences between languages -- this clashes with the monotonic nature of ASR. Therefore, we propose to generate ST tokens out-of-order while remembering how to re-order them later. We achieve this by predicting a sequence of tuples consisting of a source word, the corresponding target words, and post-editing operations dictating the correct insertion points for the target word. We examine two variants of such operation sequences which enable generation of monotonic transcriptions and non-monotonic translations from the same speech input simultaneously. We apply our approach to offline and real-time streaming models, demonstrating that we can provide explainable translations without sacrificing quality or latency. In fact, the delayed re-ordering ability of our approach improves performance during streaming. As an added benefit, our method performs ASR and ST simultaneously, making it faster than using two separate systems to perform these tasks.
translated by 谷歌翻译
The heterogeneity gap problem is the main challenge in cross-modal retrieval. Because cross-modal data (e.g. audiovisual) have different distributions and representations that cannot be directly compared. To bridge the gap between audiovisual modalities, we learn a common subspace for them by utilizing the intrinsic correlation in the natural synchronization of audio-visual data with the aid of annotated labels. TNN-CCCA is the best audio-visual cross-modal retrieval (AV-CMR) model so far, but the model training is sensitive to hard negative samples when learning common subspace by applying triplet loss to predict the relative distance between inputs. In this paper, to reduce the interference of hard negative samples in representation learning, we propose a new AV-CMR model to optimize semantic features by directly predicting labels and then measuring the intrinsic correlation between audio-visual data using complete cross-triple loss. In particular, our model projects audio-visual features into label space by minimizing the distance between predicted label features after feature projection and ground label representations. Moreover, we adopt complete cross-triplet loss to optimize the predicted label features by leveraging the relationship between all possible similarity and dissimilarity semantic information across modalities. The extensive experimental results on two audio-visual double-checked datasets have shown an improvement of approximately 2.1% in terms of average MAP over the current state-of-the-art method TNN-CCCA for the AV-CMR task, which indicates the effectiveness of our proposed model.
translated by 谷歌翻译
扬声器在彼此保持一致的过程中建立了融洽的关系。在指导域材料的同时,已经证明了与教师的融洽关系,以促进学习。过去关于教育领域的词汇一致性的工作都在量化对齐方式的措施和与代理对齐的相互作用的类型中都遭受了限制。在本文中,我们采用基于数据驱动的共享表达式概念(可能由多个单词组成)的对齐措施,并比较一对一的人类机器人(H-R)相互作用的对齐方式与协作人类人类的H-R部分中的对齐方式-Orobot(H-H-R)相互作用。我们发现,H-R设置中的学生与H-H-R设置相比,与可教的机器人保持一致,并且词汇一致性和融洽关系之间的关系比以前的理论和经验工作所预测的要复杂。
translated by 谷歌翻译
联合学习(FL)是以分散的方式共同训练机器学习算法的范式。 FL中的大多数研究都集中在基于神经网络的方法上,但是,由于克服算法的迭代和添加性特征的挑战,在联合学习中基于XGBoost的方法(例如XGBOOST)在联合学习中没有得到反应。基于决策树的模型,尤其是XGBoost,可以处理非IID数据,这对于联合学习框架中使用的算法很重要,因为数据的基本特征是分散的,并且具有本质上非IID的风险。在本文中,我们专注于研究通过对各种基于样本量的数据偏斜方案进行实验以及这些模型在各种非IID方案下的性能,通过非IID分布的影响如何受到非IID分布的影响。我们在多个不同的数据集中进行了一组广泛的实验,并进行了不同的数据偏斜分区。我们的实验结果表明,尽管有各种分区比率,但模型的性能保持一致,并且与以集中式方式训练的模型接近或同样良好。
translated by 谷歌翻译
黑盒优化在许多应用中具有潜力,例如在实验设计中的机器学习和优化中的超参数优化。 ISING机器对二进制优化问题很有用,因为变量可以由Ising机器的单个二进制变量表示。但是,使用ISING机器的常规方法无法处理具有非二进制值的黑框优化问题。为了克服这一限制,我们通过与三种不同的整数编码方法合作,通过使用ISING/退火计算机和分解计算机来提出一种用于整数变量的黑盒优化问题的方法。使用不同的编码方法,使用一个简单的问题来计算最稳定状态下的氢分子能量,以不同的编码方法进行数值评估。提出的方法可以使用任何整数编码方法来计算能量。但是,单次编码对于小尺寸的问题很有用。
translated by 谷歌翻译
机器人进行深入增强学习(RL)的导航,在复杂的环境下实现了更高的性能,并且表现良好。同时,对深度RL模型的决策的解释成为更多自主机器人安全性和可靠性的关键问题。在本文中,我们提出了一种基于深入RL模型的注意力分支的视觉解释方法。我们将注意力分支与预先训练的深度RL模型联系起来,并通过以监督的学习方式使用受过训练的深度RL模型作为正确标签来训练注意力分支。由于注意力分支经过训练以输出与深RL模型相同的结果,因此获得的注意图与具有更高可解释性的代理作用相对应。机器人导航任务的实验结果表明,所提出的方法可以生成可解释的注意图以进行视觉解释。
translated by 谷歌翻译
我们提出了一个框架,该框架会自动将不可缩放的GNN转换为基于预典型的GNN,该GNN对于大型图表有效且可扩展。我们框架的优势是两倍。1)它通过将局部特征聚合与其图形卷积中的重量学习分开,2)通过将其边缘分解为小型图形,将其有效地在GPU上进行了预先执行,将各种局部特征聚合与重量学习分开,将各种局部特征聚合从重量学习中分离出来,从而使各种不可估计的GNN转换为大规模图表。和平衡的集合。通过大规模图的广泛实验,我们证明了转化的GNN在训练时间内的运行速度比现有的GNN更快,同时实现了最先进的GNN的竞争精度。因此,我们的转型框架为可伸缩GNN的未来研究提供了简单有效的基础。
translated by 谷歌翻译